# Efficient Deployment
**Orpheus-3B-0.1-FT Q4_K_M GGUF** · freddyaboulton · Apache-2.0
GGUF quantized version of Orpheus-3B-0.1-FT, suitable for efficient inference.
*Large Language Model · English · 30 downloads · 1 like*
**DeepSeek-R1-Medical-COT GGUF** · tensorblock · Apache-2.0
DeepSeek-R1-Medical-COT is a Chain-of-Thought reasoning model specialized for the medical domain, offered in multiple quantized versions to accommodate different hardware.
*Large Language Model · English · 180 downloads · 1 like*
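For rough capacity planning with GGUF quants like those above, a file's size can be estimated from the parameter count and the quant's effective bits per weight. As an approximate, hedged figure, Q4_K_M averages on the order of 4.8–4.9 bits per weight across layers; the sketch below uses 4.85 as an assumed value:

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB: parameters x effective bits per weight.
    Ignores file metadata and any unquantized tensors, so treat it as a floor."""
    return n_params * bits_per_weight / 8 / 1e9

# 4.85 bits/weight is an assumed average for Q4_K_M; it varies by model.
q4_k_m = gguf_size_gb(3e9, 4.85)   # ~1.82 GB for a 3B model
fp16   = gguf_size_gb(3e9, 16.0)   # 6.0 GB at FP16, for comparison
```

This is why a 3B model that needs ~6 GB at FP16 fits comfortably in under 2 GB as a Q4_K_M GGUF.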
**Qwen2.5-VL-7B-Instruct FP8-Dynamic** · RedHatAI · Apache-2.0
FP8 quantized version of Qwen2.5-VL-7B-Instruct, supporting efficient vision-text inference through vLLM.
*Image-Text-to-Text · Transformers · English · 25.18k downloads · 1 like*
**DeepSeek-R1-Distill-Llama-70B FP8-Dynamic** · RedHatAI · MIT
FP8 quantized version of DeepSeek-R1-Distill-Llama-70B, which improves inference performance by reducing the bit width of weights and activations.
*Large Language Model · Transformers · 45.77k downloads · 9 likes*
**Molmo-7B-D-0924 NF4** · Scoolar · Apache-2.0
4-bit quantized version of Molmo-7B-D-0924; the NF4 quantization strategy reduces VRAM usage, making it suitable for environments with limited VRAM.
*Image-to-Text · Transformers · 1,259 downloads · 1 like*
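NF4 ("normal float 4") stores each weight as an index into a 16-entry codebook whose levels are denser near zero, matching the roughly normal distribution of trained weights. The sketch below illustrates the idea using evenly spaced normal quantiles as a stand-in codebook; the actual NF4 table used by bitsandbytes is constructed differently, so this is illustrative only:

```python
from statistics import NormalDist

def make_normalfloat_codebook(bits: int = 4) -> list[float]:
    """Toy normal-float codebook: evenly spaced quantiles of N(0, 1),
    rescaled so the largest magnitude is 1. Not the real NF4 table."""
    n = 2 ** bits
    nd = NormalDist()
    qs = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    m = max(abs(q) for q in qs)
    return [q / m for q in qs]

def quantize_block(values: list[float], codebook: list[float]):
    # Absmax-scale the block, then snap each value to the nearest level.
    scale = max(abs(v) for v in values) or 1.0
    indices = [
        min(range(len(codebook)), key=lambda k: abs(v / scale - codebook[k]))
        for v in values
    ]
    return indices, scale

def dequantize_block(indices, scale, codebook):
    return [codebook[k] * scale for k in indices]

codebook = make_normalfloat_codebook()
indices, scale = quantize_block([0.1, -0.5, 0.9, 0.0], codebook)
restored = dequantize_block(indices, scale, codebook)
```

Each weight shrinks from 16 bits to a 4-bit index plus one shared scale per block, which is where the VRAM savings come from.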
**Pixtral-12B FP8-Dynamic** · RedHatAI · Apache-2.0
pixtral-12b-FP8-dynamic is a quantized version of mistral-community/pixtral-12b. Quantizing weights and activations to the FP8 data type reduces disk size and GPU memory requirements by approximately 50%. It is suitable for commercial and research use in multiple languages.
*Image-Text-to-Text · Safetensors · Multilingual · 87.31k downloads · 9 likes*
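The ~50% figure follows directly from the storage width: 16-bit weights take 2 bytes per parameter, FP8 takes 1. A quick back-of-the-envelope check for a 12B-parameter model (weights only; activations and KV cache are extra):

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory for model weights alone, in GB."""
    return n_params * bytes_per_param / 1e9

bf16 = weight_memory_gb(12e9, 2)  # 24.0 GB at 16-bit
fp8 = weight_memory_gb(12e9, 1)   # 12.0 GB at FP8, a 50% reduction
```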
**QQQ-Llama-3-8B-g128** · HandH1998 · MIT
A version of Llama-3-8B quantized to INT4 using the QQQ quantization technique with a group size of 128, optimized for hardware efficiency.
*Large Language Model · Transformers · 1,708 downloads · 2 likes*
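Group-wise quantization limits the blast radius of outliers: each group of 128 weights shares one scale, so a single large value only degrades precision within its own group. A minimal symmetric INT4 sketch of the idea (illustrative only; QQQ's actual kernels and scale handling differ):

```python
def quantize_int4_grouped(weights: list[float], group_size: int = 128):
    """Symmetric group-wise INT4: each group gets scale = absmax / 7,
    and values are rounded to integers in [-7, 7]."""
    qs, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid div-by-zero
        scales.append(scale)
        qs.extend(max(-7, min(7, round(w / scale))) for w in group)
    return qs, scales

def dequantize_int4_grouped(qs, scales, group_size: int = 128):
    return [q * scales[i // group_size] for i, q in enumerate(qs)]

# Toy group of 4 values for illustration; real groups hold 128 weights.
weights = [0.8, -0.05, 0.3, 0.0]
qs, scales = quantize_int4_grouped(weights, group_size=4)
restored = dequantize_int4_grouped(qs, scales, group_size=4)
```

The worst-case rounding error is half a quantization step (scale / 2), which is why smaller groups, and hence tighter scales, trade a little extra scale storage for better accuracy.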